252 research outputs found

    On Incomplete XML Documents with Integrity Constraints

    Abstract. We consider incomplete specifications of XML documents in the presence of schema information and integrity constraints. We show that integrity constraints such as keys and foreign keys affect consistency of such specifications. We prove that the consistency problem for incomplete specifications with keys and foreign keys can always be solved in NP. We then show a dichotomy result, classifying the complexity of the problem as NP-complete or PTIME, depending on the precise set of features used in incomplete descriptions.

    Bisimulations on data graphs

    Bisimulation provides structural conditions to characterize indistinguishability from an external observer between nodes on labeled graphs. It is a fundamental notion used in many areas, such as verification, graph-structured databases, and constraint satisfaction. However, several current applications use graphs where nodes also contain data (the so-called “data graphs”), and where observers can test for equality or inequality of data values (e.g., asking the attribute ‘name’ of a node to be different from that of all its neighbors). The present work constitutes a first investigation of “data aware” bisimulations on data graphs. We study the problem of computing such bisimulations, based on the observational indistinguishability for XPath (a language that extends modal logics like PDL with tests for data equality), with and without transitive closure operators. We show that in general the problem is PSPACE-complete, but identify several restrictions that yield better complexity bounds (coNP, PTIME) by controlling suitable parameters of the problem, namely the amount of non-locality allowed, and the class of models considered (graphs, DAGs, trees). In particular, this analysis yields a hierarchy of tractable fragments.
    Affiliation: Abriola, Sergio Alejandro. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación En Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación En Ciencias de la Computacion; Argentina
    Affiliation: Barceló, Pablo. Universidad de Chile; Chile
    Affiliation: Figueira, Diego. Centre National de la Recherche Scientifique; France
    Affiliation: Figueira, Santiago. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación En Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación En Ciencias de la Computacion; Argentina
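
    For orientation only, the classical (data-unaware) notion of bisimulation that the abstract above starts from can be sketched as a naive partition refinement on a node-labeled graph. This is not the data-aware XPath bisimulation studied in the paper; the function name `bisimulation_classes` and the input encoding (a label dictionary plus a list of directed edges) are assumptions made for this illustration.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

Node = Hashable


def bisimulation_classes(
    labels: Dict[Node, Hashable],
    edges: Iterable[Tuple[Node, Node]],
) -> Set[FrozenSet[Node]]:
    """Return the coarsest bisimulation partition of a node-labeled graph.

    Naive signature refinement: two nodes stay in the same block only if
    they are in the same block already and reach the same set of blocks
    through their outgoing edges.
    """
    nodes = list(labels)
    succ: Dict[Node, list] = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)

    # Initial partition: one block per node label.
    ids: Dict[Hashable, int] = {}
    block = {n: ids.setdefault(labels[n], len(ids)) for n in nodes}
    num_blocks = len(ids)

    while True:
        # Signature = own block plus the set of successor blocks.
        ids = {}
        new_block = {}
        for n in nodes:
            sig = (block[n], frozenset(block[m] for m in succ[n]))
            new_block[n] = ids.setdefault(sig, len(ids))
        if len(ids) == num_blocks:
            # No block was split: the partition is stable.
            groups: Dict[int, set] = {}
            for n, b in new_block.items():
                groups.setdefault(b, set()).add(n)
            return {frozenset(g) for g in groups.values()}
        block, num_blocks = new_block, len(ids)


# Hypothetical example: a and b are bisimilar, e is not (it has no successor).
example_labels = {"a": "p", "b": "p", "c": "q", "d": "q", "e": "p"}
example_edges = [("a", "c"), ("b", "d")]
example_classes = bisimulation_classes(example_labels, example_edges)
```

    Because each signature includes the node's current block, refinement can only split blocks, so an unchanged block count signals a fixed point. Faster algorithms exist; this version favors readability.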

    Context-Free Path Querying with Structural Representation of Result

    The graph data model and graph databases are very popular in various areas such as bioinformatics, the semantic web, and social networks. One specific problem in the area is path querying with constraints formulated in terms of formal grammars. In this approach the query is written as a grammar, and path querying is graph parsing with respect to the given grammar. There are several solutions to it, but how to provide a structural representation of the query result that is practical for answer processing and debugging is still an open problem. In this paper we propose a graph parsing technique that allows one to build such a representation with respect to a given grammar in polynomial time and space for an arbitrary context-free grammar and graph. The proposed algorithm is based on the generalized LL parsing algorithm, while previous solutions are based mostly on the CYK or Earley algorithms, which reduces time complexity in some cases.
    Comment: Evaluation extended
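
    To make "path querying as graph parsing" concrete, here is a minimal sketch of the simpler relational variant of the problem: it only reports which node pairs are connected by a path whose labels are derived from a nonterminal. It uses a worklist closure over a grammar in Chomsky normal form, not the generalized LL technique of the paper, and it does not build the structural representation the paper is about. The function name and the rule encoding are assumptions made for this illustration.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Fact = Tuple[str, int, int]  # (nonterminal, source node, target node)


def cfl_reachability(
    edges: Iterable[Tuple[int, str, int]],
    terminal_rules: Dict[str, Set[str]],
    binary_rules: Dict[Tuple[str, str], Set[str]],
) -> Set[Fact]:
    """Compute all facts (A, u, v): some path u ~> v spells a word derived from A.

    terminal_rules maps a terminal t to the nonterminals A with A -> t.
    binary_rules maps a pair (B, C) to the nonterminals A with A -> B C.
    """
    facts: Set[Fact] = set()
    work: deque = deque()

    def add(fact: Fact) -> None:
        if fact not in facts:
            facts.add(fact)
            work.append(fact)

    # Seed with the edges, read through the terminal rules.
    for u, label, v in edges:
        for a in terminal_rules.get(label, ()):
            add((a, u, v))

    while work:
        b, u, v = work.popleft()
        for c, x, y in list(facts):
            if x == v:  # popped fact followed by (c, x, y)
                for a in binary_rules.get((b, c), ()):
                    add((a, u, y))
            if y == u:  # (c, x, y) followed by popped fact
                for a in binary_rules.get((c, b), ()):
                    add((a, x, v))
    return facts


# Hypothetical example: S -> a S b | a b, in CNF with helpers A, B, S1.
example_terminal = {"a": {"A"}, "b": {"B"}}
example_binary = {("A", "B"): {"S"}, ("A", "S1"): {"S"}, ("S", "B"): {"S1"}}
example_graph = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
example_facts = cfl_reachability(example_graph, example_terminal, example_binary)
example_s_pairs = {(u, v) for (n, u, v) in example_facts if n == "S"}
```

    On the chain graph above, `example_s_pairs` contains the pairs (1, 3) and (0, 4), matching the words `ab` and `aabb`. The closure is polynomial in the size of the graph and grammar, but this naive loop rescans all facts on every step.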

    Separating Automatic Relations

    We study the separability problem for automatic relations (i.e., relations on finite words definable by synchronous automata) in terms of recognizable relations (i.e., finite unions of products of regular languages). This problem takes as input two automatic relations $R$ and $R'$, and asks if there exists a recognizable relation $S$ that contains $R$ and does not intersect $R'$. We show this problem to be undecidable when the number of products allowed in the recognizable relation is fixed. In particular, checking if there exists a recognizable relation $S$ with at most $k$ products of regular languages that separates $R$ from $R'$ is undecidable, for each fixed $k \geq 2$. Our proofs reveal tight connections, of independent interest, between the separability problem and the finite coloring problem for automatic graphs, where colors are regular languages.
    Comment: Long version of a paper accepted at MFCS 202
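
    In symbols, the separability question with at most $k$ products can be written as follows. This sketch assumes binary relations over an alphabet $\Sigma$ (the abstract does not fix the arity), and the names $L_i$, $L_i'$ are introduced here only for illustration:

```latex
\[
  \exists\, L_1, L_1', \dots, L_k, L_k' \subseteq \Sigma^{*} \text{ regular such that}\quad
  S = \bigcup_{i=1}^{k} L_i \times L_i', \qquad
  R \subseteq S, \qquad
  S \cap R' = \emptyset .
\]
```

    Since some of the products may be empty, a union of exactly $k$ products covers the "at most $k$" case.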

    Model Interpretability through the Lens of Computational Complexity

    In spite of several claims stating that some models are more interpretable than others -- e.g., "linear models are more interpretable than deep neural networks" -- we still lack a principled notion of interpretability to formally compare among different classes of models. We make a step towards such a notion by studying whether folklore interpretability claims have a correlate in terms of computational complexity theory. We focus on local post-hoc explainability queries that, intuitively, attempt to answer why individual inputs are classified in a certain way by a given model. In a nutshell, we say that a class $\mathcal{C}_1$ of models is more interpretable than another class $\mathcal{C}_2$, if the computational complexity of answering post-hoc queries for models in $\mathcal{C}_2$ is higher than for those in $\mathcal{C}_1$. We prove that this notion provides a good theoretical counterpart to current beliefs on the interpretability of models; in particular, we show that under our definition and assuming standard complexity-theoretical assumptions (such as $\mathrm{P} \neq \mathrm{NP}$), both linear and tree-based models are strictly more interpretable than neural networks. Our complexity analysis, however, does not provide a clear-cut difference between linear and tree-based models, as we obtain different results depending on the particular post-hoc explanations considered. Finally, by applying a finer complexity analysis based on parameterized complexity, we are able to prove a theoretical result suggesting that shallow neural networks are more interpretable than deeper ones.
    Comment: 36 pages, including 9 pages of main text. This is the arXiv version of the NeurIPS'2020 paper. Except for minor differences that could be introduced by the publisher, the only difference should be the addition of the appendix, which contains all the proofs that do not appear in the main text

    No Agreement Without Loss: Learning and Social Choice in Peer Review

    In peer review systems, reviewers are often asked to evaluate various features of submissions, such as technical quality or novelty. A score is given to each of the predefined features and based on these the reviewer has to provide an overall quantitative recommendation. However, reviewers differ in how much they value different features. It may be assumed that each reviewer has her own mapping from a set of criteria scores (score vectors) to a recommendation, and that different reviewers have different mappings in mind. Recently, Noothigattu, Shah and Procaccia introduced a novel framework for obtaining an aggregated mapping by means of Empirical Risk Minimization based on $L(p,q)$ loss functions, and studied its axiomatic properties in the sense of social choice theory. We provide a body of new results about this framework. On the one hand, we study a trade-off between strategy-proofness and the ability of the method to properly capture agreements of the majority of reviewers. On the other hand, we show that dropping a certain unrealistic assumption invalidates the previously reported results. Moreover, in the general case, strategy-proofness fails dramatically in the sense that a reviewer is able to make significant changes to the solution in her favor by arbitrarily small changes to her true beliefs. In particular, no approximate version of strategy-proofness is possible in this general setting since the method is not even continuous w.r.t. the data. Finally, we propose a modified aggregation algorithm which is continuous and show that it has good axiomatic properties.
    Comment: preprint submitted to a conference

    Semantic Optimization of Conjunctive Queries

    This work deals with the problem of semantic optimization of the central class of conjunctive queries (CQs). Since CQ evaluation is NP-complete, a long line of research has focussed on identifying fragments of CQs that can be efficiently evaluated. One of the most general restrictions corresponds to generalized hypertreewidth bounded by a fixed constant $k \geq 1$; the associated fragment is denoted $\mathrm{GHW}_k$. A CQ is semantically in $\mathrm{GHW}_k$ if it is equivalent to a CQ in $\mathrm{GHW}_k$. The problem of checking whether a CQ is semantically in $\mathrm{GHW}_k$ has been studied in the constraint-free case, and it has been shown to be NP-complete. However, in case the database is subject to constraints such as tuple-generating dependencies (TGDs) that can express, e.g., inclusion dependencies, or equality-generating dependencies (EGDs) that capture, e.g., key dependencies, a CQ may turn out to be semantically in $\mathrm{GHW}_k$ under the constraints, while not being semantically in $\mathrm{GHW}_k$ without the constraints. This opens avenues to new query optimization techniques. In this article, we initiate and develop the theory of semantic optimization of CQs under constraints. More precisely, we study the following natural problem: Given a CQ and a set of constraints, is the query semantically in $\mathrm{GHW}_k$, for a fixed $k \geq 1$, under the constraints, or, in other words, is the query equivalent to one that belongs to $\mathrm{GHW}_k$ over all those databases that satisfy the constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not a sufficient condition for the decidability of the problem in question. In particular, we show that checking whether a CQ is semantically in $\mathrm{GHW}_1$ is undecidable in the presence of full TGDs (i.e., Datalog rules) or EGDs.
    In view of the above negative results, we focus on the main classes of TGDs for which CQ containment is decidable and that do not capture the class of full TGDs, i.e., guarded, non-recursive, and sticky sets of TGDs, and show that the problem in question is decidable, while its complexity coincides with the complexity of CQ containment. We also consider key dependencies over unary and binary relations, and we show that the problem in question is decidable in elementary time. Furthermore, we investigate whether being semantically in $\mathrm{GHW}_k$ alleviates the cost of query evaluation. Finally, in case a CQ is not semantically in $\mathrm{GHW}_k$, we discuss how it can be approximated via a CQ that falls in $\mathrm{GHW}_k$ in an optimal way. Such approximations might help find “quick” answers to the input query when exact evaluation is intractable.